A Systematic Evaluation of Transfer Learning and Pseudo-labeling with BERT-based Ranking Models. (arXiv:2103.03335v3 [cs.IR] UPDATED)
Due to high annotation costs, making the best use of existing human-created
training data is an important research direction. We therefore carry out a
systematic evaluation of the transferability of BERT-based neural ranking models
across five English datasets. Previous studies focused primarily on zero-shot
and few-shot transfer from a large dataset to a dataset with a small number of
queries. In contrast, each of our collections has a substantial number of
queries, which enables a full-shot evaluation mode and improves the reliability
of our results. Furthermore, since source datasets' licences often prohibit
commercial use, we compare transfer learning to training on pseudo-labels
generated by a BM25 scorer. We find that training on pseudo-labels, possibly
with subsequent fine-tuning using a modest number of annotated queries, can
produce a competitive or better model compared to transfer learning. Yet the
stability and/or effectiveness of few-shot training needs to be improved, since
few-shot fine-tuning can sometimes degrade the performance of a pretrained model.
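The abstract does not say how BM25 scores are turned into training labels. The sketch below makes one assumption to show the idea: for each query, the top-ranked BM25 document counts as a pseudo-positive and the next few as pseudo-negatives. The names `BM25`, `bm25_pseudo_labels`, `n_pos` and `n_neg`, the whitespace tokenizer, and the labeling scheme are all hypothetical. The paper's actual procedure may differ. Scoring follows standard Okapi BM25 with common default parameters.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical minimal tokenizer: lowercase and split on whitespace.
    return text.lower().split()


class BM25:
    """Okapi BM25 over a small in-memory corpus (k1, b are common defaults)."""

    def __init__(self, docs, k1=1.2, b=0.75):
        self.k1, self.b = k1, b
        self.docs = [tokenize(d) for d in docs]
        self.doc_lens = [len(d) for d in self.docs]
        self.avgdl = sum(self.doc_lens) / len(self.docs)
        self.tfs = [Counter(d) for d in self.docs]
        df = Counter(t for d in self.docs for t in set(d))
        n = len(self.docs)
        # Non-negative IDF variant: log(1 + (N - df + 0.5) / (df + 0.5)).
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query, i):
        tf, dl = self.tfs[i], self.doc_lens[i]
        s = 0.0
        for t in tokenize(query):
            if t not in tf:
                continue
            f = tf[t]
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            s += self.idf[t] * f * (self.k1 + 1) / norm
        return s

    def rank(self, query):
        scored = [(self.score(query, i), i) for i in range(len(self.docs))]
        # Descending score; ties broken by document index for determinism.
        scored.sort(key=lambda x: (-x[0], x[1]))
        return [i for _, i in scored]


def bm25_pseudo_labels(queries, docs, n_pos=1, n_neg=2):
    """Hypothetical scheme: top n_pos BM25 hits get label 1, the next n_neg get 0."""
    bm25 = BM25(docs)
    labels = []
    for qid, q in enumerate(queries):
        ranked = bm25.rank(q)
        for i in ranked[:n_pos]:
            labels.append((qid, i, 1))
        for i in ranked[n_pos:n_pos + n_neg]:
            labels.append((qid, i, 0))
    return labels


docs = ["bert ranking model", "bm25 baseline retrieval", "cooking pasta recipes"]
queries = ["bert ranking", "pasta recipes"]
labels = bm25_pseudo_labels(queries, docs)  # (query_id, doc_id, label) triples
```

The resulting `(query_id, doc_id, label)` triples could then train a ranker in place of human judgments. Because BM25 sometimes places relevant documents outside the top positions, such labels are noisy. That noise is one reason to follow them with fine-tuning on a modest number of annotated queries.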